(54) Method and apparatus for video skimming

(57) The present invention relates to a system for
searching and browsing multimedia, and more particularly
to a video skimming method and apparatus that let a user
grasp the full content of a video within a short time and
move rapidly to a desired portion. The content of the
video is skimmed based on scenes and shots obtained by
shot clustering and shot segmentation: scenes to be
reproduced and scenes to be skipped are selected, and a
particular portion of each shot in a scene to be
reproduced is then reproduced continuously, or partially
reproduced by a skipping technique.
Description

BACKGROUND OF THE INVENTION 

1. Field of the Invention

[0001] The present invention relates to a system for
searching and browsing multimedia, and more particularly
to a video skimming system that lets a user briefly grasp
the full content of a video and move rapidly to a desired
portion, based on a meaningful story structure that
follows the progress of the video content, taken from the
structural information of the video content.

2. Description of the Related Art

[0002] As mass media has developed and the production of
multimedia content has become easier, the quantity of
media received by the general public every day has become
enormous. As a result, there is demand for automatic
systems that sort out the data a user wants, and methods
for meeting this demand are being studied. In particular,
with the development of digital technology, video content
is increasingly stored and distributed in a digital
format. As digital broadcasting becomes popular, the
digitization of media will accelerate.
[0003] With such digital video content, one user may wish
to view only sports-related news, while another may wish
to view only securities-related news. A user may also ask
to view only the scenes of a show in which a particular
person appears. Various studies are under way to
accommodate such user requests.
[0004] Moreover, a user may wish to grasp the full video
content within a limited time. Such a request is met by
"highlights". Generally, highlights can be understood as
content newly assembled from the important scenes of a
video content. Examples include "Sports Highlights",
"Preview of Movie", "Headline News" and the like. However,
with current technologies it is very difficult to automate
the extraction of highlights from a video content, so in
most cases this extraction depends on manual work. As
mentioned above, the quantity of media has increased
explosively, so manually providing highlights of every
video content would require so much labor that it is
almost impossible. Therefore, an automated system is
needed that allows a user to understand the outline of the
content within a short time.
[0005] With the development of digital technologies, key
frames are used for moving to a desired position in a
video content. By using a video summary built from key
frames, a user can move rapidly to a desired position. A
large number of key frames is needed to search easily for
a desired portion, but it is difficult to display a large
number of key frames in a restricted display space, so the
user has to make many selections. In addition, it is
generally difficult to understand the full content of a
video by a method using key frames.
[0006] Recently, various video indexing techniques have
been studied for finding a desired scene in a digital
video. For a user who wants only the scenes in which a
particular person appears, studies address indexing
information on the appearance of a person by finding the
scenes in which a person appears and recognizing who the
person is; other studies address extracting and indexing
the principal scenes of a movie or sports program.
However, video genres are highly varied and the data to be
indexed differ greatly from genre to genre. Hence, it is
known to be very difficult, with current techniques, to
implement an automated system that extracts meaningful
information with a high level of accuracy.

[0007] On the other hand, in digital video, unlike analog
video, the degradation of image quality can be prevented
when fast wind/fast rewind functions are executed.

[0008] As a fast reproduction method generally used 
in a digital video, a method for increasing a number of 
frames decoded per unit time and displaying parts there- 
of, or a method for decoding and displaying frames while 
skipping a certain portion is used. 
[0009] However, in the method for increasing the 
number of frames decoded per unit time, it is disadvan- 
tageous in that the maximum speed is affected by the 
performance of a terminal device. Thus, for the fast 
wind/fast rewind of a digital video, the method for de- 
coding and displaying frames while skipping a certain 
portion is used in general. The fast wind/fast rewind 
technique in the digital video is the most reasonable one 
of existing techniques for complying with the request of 
the user wanting to understand the full content within a 
restricted time or wanting to move to a desired portion. 
However, predetermined intervals of time are used in 
skipping a certain portion, and thus there is a disadvan- 
tage that the user misses the scene of a desired portion 
or a less important portion is reproduced relatively often. 

SUMMARY OF THE INVENTION 

[0010] A video skimming method according to the 
present invention includes the steps of: recognizing an 
individual shot section, a physical editing unit, as struc- 
tural information for a video stream by shot segmenta- 
tion; selecting a particular portion in the recognized in- 
dividual shot section as reproduction video information 
reflecting the content of the corresponding shot; and 
continuously reproducing the video information selected 
for the individual shot. 

[0011] Here, the video skimming method further in- 
cludes the shot selection step of determining shots to 
be reproduced and shots to be skipped after recognizing 
the individual shot section. 
[0012] Here, the structural information on the video
stream comprises scene information, i.e., logical story
units, and shot information, i.e., physical editing units
given together with temporal technical information
(starting position and duration, or starting position and
end position), and further includes technical information
on shot properties.

[0013] Here, in the shot selection step, the repetitive
reproduction of shots having similar properties is
minimized by deciding to skip some of the shots having
similar properties and to use only the remaining shots for
skimming.

[0014] Here, in selecting shots to be reproduced from the
similar shots, the shots to be used for skimming are
selected by giving a higher selection weight to shots
located in the latter half of a scene.
[0015] Here, as a reproduction portion (segment)
representative of each shot, the front portion, rear
portion and center portion of the corresponding shot, or
the front portion and rear portion thereof, are used at
the same time. The length of the reproduction portion
(segment) representative of each shot is identical.
[0016] Here, if the length of the segment selected as the
reproduction portion in a shot is larger than the length
of the corresponding shot, the length of the reproduction
portion in that shot is reduced to the length of the
corresponding shot or less.
[0017] Here, based on the average value of the
image/motion/audio similarities within the individual
shot, the length of the reproduction portion (segment)
representative of each shot is reduced if the similarity
is high, or increased if the similarity is low.

[0018] Here, the image/motion/audio similarities in the
shot representative of the scene mean the similarity
between frames, motion vectors and audio data at different
time positions.

[0019] Here, the reproduction speed of segments to 
be reproduced in each shot is controlled variably. 
[0020] Here, the reproduction section is reproduced at a
high speed either by increasing the number of frames
decoded per unit time, so that the reproduction speed is
higher than the normal speed, or by skipping some frames
in the middle without decoding all frames in the
reproduction section.
[0021] Here, when the high speed skimming method using
skipping is applied to a video stream using a coding
scheme with interframe compression such as MPEG, the
frames to be decoded are I frames, whose frame data can be
obtained by decoding only the corresponding frame without
decoding other frames.
[0022] In addition, a video skimming apparatus according
to the present invention includes: a user interface unit
for inputting a user command for video skimming in order
to search and browse digital video data as multimedia
data; a control unit for skimming the corresponding video
file based on structural information for video content
according to the user command inputted from the user
interface unit; a video information file for providing the
control unit with the structural information for the video
content, as index information for the digital video data
and the corresponding video; and a display unit for
reproducing the video skimmed by the control unit.
[0023] Here, the structural information for the video
content includes, based on scene information, the shots
that are representative of the corresponding scene, a
logical story unit, and that are to be reproduced; these
shots are taken from shot information, a shot being a
physical editing unit that is an element of the scene.

[0024] Here, the structural information for the video 
content further includes segments to be reproduced in 
the shot representative of the corresponding scene. 
[0025] Here, the user interface unit includes a unit for 
designating a summary level as a degree of video skim- 
ming or a unit for designating the speed of a reproduc- 
tion section in video skimming in order to select the sum- 
mary level or reproduction speed of video in video skim- 
ming. 

[0026] Here, the control unit reads video index
information related to shot segmentation information and
shot clustering information from the index file according
to a skimming condition given by a user input or by basic
settings, calculates the segments to be reproduced that
conform to the video skimming condition, reproduces the
corresponding segments from the related media file
continuously, and outputs the same to the display unit.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0027] The above objects, features and advantages 
of the present invention will become more apparent from 
the following detailed description when taken in con- 
junction with the accompanying drawings, in which: 

Figure 1 is a view explaining the concept of shot 
segmentation and clustering; 
Figure 2 is a view explaining the concept of a video 
skimming method using shot segmentation infor- 
mation; 

Figure 3 is a view illustrating an example of a meth- 
od of transition of shots of an interactive scene; 
Figure 4 is a view illustrating an example of a scene 
detection method using shot properties; 
Figure 5 is a view illustrating an example of a meth- 
od for selecting shots to be reproduced and shots 
to be skipped in skimming using structural informa- 
tion; 

Figure 6 is a view explaining a method for selecting 
shots to be reproduced and shots to be skipped in 
consideration of the location of shots in a scene and 
repetitive information; 

Figure 7 is a view explaining a method for selecting 
a portion to be skipped and a portion to be repro- 
duced in a shot; 

Figure 8 is a view illustrating a method for selecting
a dynamic unit reproduction length using the
dissimilarity of a shot;

Figure 9 is a view explaining a quick skimming 
method using skipping; 

Figure 10 is a view explaining a skimming method
using structural information of video content; and 
Figure 11 is a view illustrating one example of the 
configuration of a system for video skimming using 
structural information of video content. 

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

[0028] With the development of digital video techniques
and image/video recognition techniques, users have become
able to search, filter and browse only a desired portion
of a desired video at a desired point of time.
[0029] The most basic techniques for non-linear video 
browsing and searching are the shot segmentation tech- 
nique and the shot clustering technique. These two 
techniques are the most essential ones for analyzing 
video. Therefore, many studies have been concentrated 
on shot segmentation up to now, and the study of the 
shot clustering technique is being started. 
[0030] Based on the results of various studies, the 
shot segmentation can be automated, and most algo- 
rithms can be implemented with a high accuracy of more 
than 90%. 

[0031] In addition, the shot clustering also can be au- 
tomated with accuracy of high level by applying the tech- 
nique conforming to the genre of a program by detecting 
a characteristic event or using general characteristics of 
shots. 

[0032] Generally, a video content is logically segmented
into several story units. Such a unit of the story
structure is generally referred to as an event or scene,
examples of which include a gunfight scene, an interactive
scene, etc. Such a scene is constructed of a sequence of
sub-scenes or shots.

[0033] A shot denotes a sequence of video frames ob- 
tained from one camera without interruption, which is 
the most basic unit in video analysis or construction. 
[0034] Generally, a video is constructed of a se- 
quence of many shots. Shot segmentation denotes a 
method for segmenting video into individual shots. Shot 
clustering denotes a process for detecting a logical story 
structure of a video content by reconstructing shots in 
logical scene units based on each of the individual shots 
and the characteristics thereof. 
[0035] The thusly configured video skimming system 
using scene and shot information, i.e., structural infor- 
mation of video content according to the present inven- 
tion will now be described with reference to the accom- 
panying drawings. 

[0036] Figure 1 is a view illustrating a shot
segmentation process and a shot clustering process.
Generally, most shot segmentation algorithms are based on
the feature that image/motion/audio similarity is present
within the same shot while image/motion/audio
dissimilarity is found between two different shots, and
most shot clustering algorithms are based on the feature
that shots having similar characteristics are detected
again within a predetermined time.
[0037] Generally, video highlights are produced by
selecting meaningful segments in the progress of the
content of a video stream and continuously reproducing
these segments.

[0038] However, it is very difficult to automate the
selection of meaningful segments in the progress of
various video contents.

[0039] Nevertheless, if shot segmentation information is
used for video skimming, it is possible to implement a
skimming method that reproduces only a certain portion of
each of the shots existing in every video and skips the
remaining portion, so that the reproduced length is
smaller than that of the original stream. Such a skimming
method is advantageous in that a completely automated
system can be constructed, since the shot segmentation
technique can be automated, and in that the problem of
reproducing an unimportant scene at great length or
missing an important scene, which arises in the fast
wind/fast rewind of general digital video, can be
reduced.
[0040] Figure 2 is a view of a summary of a video
skimming method using shot segmentation information.
[0041] A portion shown in gray in Figure 2 indicates a
portion to be reproduced in the skimming method using shot
segmentation information, and the remaining portion
indicates a portion to be skipped.

[0042] However, when only the shot segmentation
information is used in video skimming, scene information,
which is the logical story structure existing in video
content, is not used, and therefore repetitive shots
continue to be played in a particular event section such
as an interactive scene.

[0043] Figure 3 is a view illustrating an array structure
of shots in a long interactive scene. In Figure 3, each
shot is represented by an English capital letter (A, B, C,
D) based on shot properties detected by the shot
segmentation process.

[0044] In other words, the interactive scene represented
in Figure 3 is a scene in which character 1 and character
2 are shown in close-up in turn, and it is constructed of
many shots.

[0045] However, if only shot segmentation information is
used in video skimming, a certain portion of every shot in
the interactive scene is reproduced. Therefore, there is a
disadvantage that this scene is reproduced for a long time
even though it gives the user no additional information
beyond the fact that two persons are talking.
[0046] In the present invention, the above-mentioned
disadvantage is overcome by performing video skimming in
consideration of both shot information and scene
information as structural information of a video content.

[0047] In other words, the present invention proposes a
skimming method and apparatus that pick out shots to be
reproduced and shots to be skipped from the shots of each
scene existing in every video, reproduce only a certain
portion (segment) of each shot to be reproduced, and skip
the remaining portion, so that the reproduced length is
smaller than that of the original video stream.

[0048] As the results of various studies, it is known
that a scene of content such as a movie or drama can be
detected by detecting a particular event such as a
gunfight scene, an interactive scene, etc., and thus an
index structure in a ToC (Table of Contents) format can be
automatically generated.

[0049] Figure 4 is a view illustrating a process of de- 
tecting a story unit for a general video content. 
[0050] As in Figure 3, each shot is represented by an
English capital letter based on shot properties detected
by the shot segmentation process. In the shot transition
structure of an interactive scene of a drama or movie, a
characteristic pattern of shots such as A, B, A, B, ...
appears in most cases. Figure 4 shows the process of
determining a section to be one scene if shots having
similar properties are detected within a predetermined
period of time. In Figure 4, scene 1 consists of shots
having feature values A, B, C. No shot having a feature
value of A, B or C appears for a predetermined time after
shot1-B3, and thus the scene is detected by taking the end
time of shot1-B3 as the end time of scene 1. In Figure 4,
scene 2 consists of shots having feature values F, H, E.
No shot having a feature value of F, H or E appears for a
predetermined time after the last shot of this scene, and
thus the end time of scene 2 can be detected.
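
The time-window rule described for Figure 4 can be sketched roughly as follows. This is an illustrative sketch only, not the implementation of the invention: the `Shot` record, the `detect_scenes` function and the `window` parameter (the "predetermined time", in seconds) are hypothetical names, and the labels are assumed to come from an earlier shot clustering step.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    label: str    # shot-property cluster, e.g. "A", "B" (assumed given)
    start: float  # start time in seconds
    end: float    # end time in seconds


def detect_scenes(shots, window=30.0):
    """Group time-ordered shots into scenes.

    A scene keeps growing while a shot whose label already occurred in
    the scene reappears within `window` seconds of the last shot known
    to belong to the scene; otherwise the scene ends at that last shot.
    """
    scenes = []
    i = 0
    while i < len(shots):
        labels = {shots[i].label}
        last = i  # index of the last shot confirmed to be in the scene
        j = i + 1
        while j < len(shots) and shots[j].start - shots[last].end <= window:
            if shots[j].label in labels:
                # A known label reappeared: every shot up to j joins the scene.
                labels.update(s.label for s in shots[last + 1:j + 1])
                last = j
            j += 1
        scenes.append(shots[i:last + 1])
        i = last + 1
    return scenes
```

With eleven five-second shots labelled A B A B C B F H F E H and a 12-second window, the sketch yields two scenes, "ABABCB" and "FHFEH", mirroring the two-scene situation described above.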
[0051] Besides this method, an interactive scene can be
detected more accurately by face detection and face
recognition. Such a method is usually applied to general
dramas or movies.
[0052] As described above, the present invention im- 
plements video skimming by using scene and shot in- 
formation which are structural information for video con- 
tent, and considers how to select a shot to be repro- 
duced from shots of a scene, how to select a portion to 
be reproduced and a portion to be skipped from the shot 
selected as the shot to be reproduced, how to select a 
reproduction length of the portion to be reproduced and 
how to reproduce in a reproduction section. 
[0053] Firstly, Figure 5 is a view illustrating a summary
of the video skimming method of the present invention.
[0054] In Figure 5, structural information of video
content indexed by the shot segmentation process and the
shot clustering process is used. In Figure 5, shots
selected for reproduction in video skimming using
structural information are indicated in gray and shots to
be skipped are indicated in white. That is, for video
skimming using structural information, the system first
determines the shots to be reproduced for each scene and
then determines the method for reproducing each individual
shot.

[0055] Figure 5 is an example of reproducing only one
notable shot among similar shots, and only once, so that
repetitive shots among the shots of scene 1 are not
reproduced.

[0056] In the present invention, the shot selection for
determining shots to be reproduced and shots to be skipped
among the shots of each scene existing in a video stream
is achieved as follows.
[0057] When many shots having similar properties exist in
one scene, the outline of the content of the scene can be
delivered by selecting a representative shot, without any
particular weight conditions, and using it in skimming.
However, in story structures such as general dramas and
movies, much more information is expressed in the latter
half of a scene. In other words, the introduction part is
usually less important than the conclusion part.
Therefore, when similar shots appear many times in a
scene, much more information can be provided to a user by
selecting shots in the latter half of the scene as the
shots to be reproduced in skimming.
[0058] Figure 6 illustrates a method (a of Figure 6) for
selecting shots to be reproduced in the former half of a
scene and a method (b of Figure 6) for selecting shots to
be reproduced in the latter half of the scene.
[0059] Both a and b of Figure 6 are examples of selecting
only one shot for skimming if similar shots exist in one
scene. In a of Figure 6, among the shots having shot
properties A, B, C, the shots appearing first are selected
as the shots to be reproduced. In b of Figure 6, among the
shots having shot properties A, B, C, the shots appearing
last are selected as the shots to be reproduced.
Generally, the method of b of Figure 6 yields higher user
satisfaction than the method of a of Figure 6.
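
The two selection policies of Figure 6 can be sketched as below. The function name `select_shots` and the `prefer` parameter are hypothetical, and a scene is assumed to be given simply as its sequence of shot-property labels; the sketch keeps exactly one shot per label, which is the simplest case described above.

```python
def select_shots(scene_labels, prefer="last"):
    """Return the indices of the shots to reproduce, one per distinct label.

    prefer="first" keeps the earliest shot of each label (a of Figure 6);
    prefer="last" keeps the latest shot of each label (b of Figure 6).
    """
    if prefer not in ("first", "last"):
        raise ValueError("prefer must be 'first' or 'last'")
    chosen = {}
    for index, label in enumerate(scene_labels):
        # For "last" always overwrite; for "first" keep the first index seen.
        if prefer == "last" or label not in chosen:
            chosen[label] = index
    return sorted(chosen.values())


# Example scene with the label pattern A, B, A, B, C, B.
print(select_shots("ABABCB", prefer="first"))  # [0, 1, 4]
print(select_shots("ABABCB", prefer="last"))   # [2, 4, 5]
```

All other shots of the scene are treated as shots to be skipped.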

[0060] Next, the method for selecting a portion to be 
reproduced and a portion to be skipped in each shot will 
now be described. 

[0061] In skimming using structural information of video
content, a summary of the video content can be provided by
continuously reproducing the shots selected above.
However, a video skimming method that plays each selected
shot in full generally provides only a low level of
summarization. Usually, a user can understand the content
of the full shot by viewing only parts of the shot. In
selecting a portion to be reproduced from a shot selected
for reproduction in video skimming using structural
information of video content, the front portion, rear
portion or center portion of the shot can be selected
unconditionally. Figure 7 is a view illustrating a portion
to be skipped and a portion to be reproduced in the method
for video skimming using the front, rear and center
portions, or the front and rear portions, of a shot at the
same time.

[0062] As the result of a test, it is found that higher
user satisfaction is achieved by skipping the front
portion of the corresponding shot and reproducing the rear
portion thereof, though this differs according to the
genre of video. The reasons are that the conclusion part
of a shot (e.g., a goal scene in a soccer game) is more
important than the introduction or development part for
understanding the content of the shot, and that, when a
method such as a stepwise chart explanation is used in a
program like news, only part of the content is expressed
in the former half of the shot while the full content is
expressed in the latter half.
[0063] However, the front portion of the shot may be
important according to the genre of video, for example, in
educational broadcasting devoted mainly to solving
questions.
[0064] In such a broadcasting program, the information on
which question is dealt with is present at the front
portion of the shot, and the work of solving the question
continues after the front portion. Thus, when a user is
looking for a desired portion, much more information can
be provided by reproducing the front portion of the shot
rather than the rear portion.

[0065] Therefore, in the present invention, the posi- 
tion to be reproduced in the shot can be selected differ- 
ently according to the characteristics of the content of 
video, and skimming can be implemented by using the 
front portion, the center portion and the rear portion in 
combination with one another in the same shot. 
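
A minimal sketch of choosing the reproduced portion within one shot is given below. The names `pick_segments`, `length` and `positions` are hypothetical. The fallback to the full shot when the requested segments do not fit is one possible way, assumed here, of keeping the reproduced length within the length of a short shot.

```python
def pick_segments(start, end, length, positions=("rear",)):
    """Return (start, end) time pairs to reproduce inside one shot.

    `positions` may contain "front", "center" and/or "rear"; every
    segment has the same `length` in seconds.
    """
    duration = end - start
    if length * len(positions) >= duration:
        # The segments would cover (or exceed) the shot: play it in full.
        return [(start, end)]
    segments = []
    for position in positions:
        if position == "front":
            seg_start = start
        elif position == "center":
            seg_start = start + (duration - length) / 2
        elif position == "rear":
            seg_start = end - length
        else:
            raise ValueError(f"unknown position: {position}")
        segments.append((seg_start, seg_start + length))
    return sorted(segments)
```

For a shot running from 10 s to 30 s, `pick_segments(10.0, 30.0, 4.0, ("front", "rear"))` gives the first and last four seconds, while a 3-second shot with a requested 5-second segment is simply played in full. A genre-dependent default (rear for dramas or sports, front for question-solving programs) could be passed in through `positions`.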
[0066] Next, the method for selecting a reproduction 
length according to the present invention will now be de- 
scribed. 

[0067] Methods for selecting a reproduction length in each
shot can be divided into a method that selects segments of
the same length as the portion to be reproduced for every
selected shot, and a method that selects a different
reproduction length for each shot by using a shot
property.

[0068] The shot property used here is based on the
average image/motion/audio similarities within one shot.
That is, it can be judged that the larger the
image/motion/audio similarities in one shot are, the more
monotonous the scene is; in such a scene, skipping is
performed more often. Conversely, it can be judged that
the smaller the image/motion/audio similarities in the
shot are, the more complicated the content of the scene
is; in such a scene, skipping is performed less often. In
this way, the length of the unit segment to be reproduced
can be adjusted dynamically.
[0069] This method skips a portion with much information
less often and a portion with little information more
often, without depending on the time length of the shot.
By this method, video skimming with a high level of user
comprehension can be provided compared to the method of
reproducing segments of the same length for every selected
shot.
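
One way to turn this idea into a number is sketched below. The linear scaling rule and all parameter names (`base`, `reference`, `min_len`) are assumptions made for illustration; the text above only states that a monotonous shot should be skipped more and a complicated shot less.

```python
def reproduction_length(mean_dissimilarity, shot_length,
                        base=4.0, reference=0.25, min_len=1.0):
    """Return the length (seconds) to reproduce from one shot.

    mean_dissimilarity: average frame-to-frame dissimilarity of the shot
        (0 = perfectly monotonous); assumed to be precomputed.
    base: length used for a shot whose dissimilarity equals `reference`.
    """
    # Scale the base length in proportion to how "busy" the shot is.
    length = base * (mean_dissimilarity / reference)
    # Never go below a minimum that is still watchable ...
    length = max(min_len, length)
    # ... and never exceed the shot itself.
    return min(length, shot_length)


print(reproduction_length(0.25, 60.0))  # 4.0
print(reproduction_length(0.5, 60.0))   # 8.0
print(reproduction_length(0.0, 60.0))   # 1.0
print(reproduction_length(0.5, 5.0))    # 5.0
```

The final `min` keeps the result within the shot length regardless of how large the scaled value becomes.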

[0070] Figure 8 is a view illustrating an example of a 
method for selecting a length to be reproduced and 
skipped based on image/motion/audio similarities in a 
shot. 



[0071] In the graph of Figure 8, the horizontal axis
indicates time and the vertical axis indicates the
accumulated value of image/motion/audio dissimilarities in
the shot. Such dissimilarity data are data representing
shot properties that can generally be extracted from a
shot segmentation algorithm.

[0072] As an example of dissimilarity, the difference in
color histogram variance between adjacent frames, or
between frames at predetermined intervals, can be taken.
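
As a rough illustration of a histogram-based dissimilarity and its accumulated value, the sketch below compares normalized intensity histograms of adjacent frames. Note the assumptions: it uses a plain L1 histogram difference rather than the histogram-variance measure named above, frames are modelled as non-empty flat lists of 8-bit intensities, and the names `histogram`, `dissimilarity` and `accumulated_dissimilarity` are hypothetical.

```python
from itertools import accumulate


def histogram(frame, bins=4, max_value=256):
    """Normalized histogram of a non-empty frame of integer intensities."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [c / len(frame) for c in counts]


def dissimilarity(frame_a, frame_b, bins=4):
    """Half the L1 distance between the two histograms, in [0, 1]."""
    hist_a, hist_b = histogram(frame_a, bins), histogram(frame_b, bins)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b)) / 2


def accumulated_dissimilarity(frames, bins=4):
    """Running sum of adjacent-frame dissimilarities over a shot."""
    steps = [dissimilarity(a, b, bins) for a, b in zip(frames, frames[1:])]
    return list(accumulate(steps))
```

The slope of the accumulated curve plays the role of the "average rate of change" of a shot: a steep curve indicates a visually busy shot, a flat one a monotonous shot.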

[0073] In Figure 8, since the average rate of change of
shot A is larger than that of shot B, though both shot A
and shot B have a similar length, the circumstance in
which more portions are reproduced in shot B than shot A
is shown.

[0074] If the length of a shot is not considered in
setting a reproduction section, an error situation may
occur in which the length of the reproduction section
becomes larger than that of the corresponding shot (if the
shot is very short). Hence, in the skimming method of the
present invention, when the length of a unit section
becomes larger than that of the corresponding shot, as an
exception either the full shot may be selected as the
reproduction section or parts of it may be selected as the
reproduction section in consideration of the length of the
shot.
[0075] Next, the method for reproducing a scene, and a
reproduction section in a shot to be reproduced in the
scene, based on the structural information for video
content will be explained.

[0076] The video skimming method according to the present
invention can be applied in the backward direction as well
as the forward direction.
[0077] When the segments selected as reproduction
sections in each shot are continuously reproduced, a user
can understand the full content and obtain outline
information on the content in a short time. Moreover, no
user intervention is required for searching for a desired
position.
[0078] In the video skimming method of the present
invention, the methods for reproducing the segments
selected as reproduction sections in each shot can be
divided into two.

[0079] The first method reproduces each segment in the
same way as normal reproduction. The second method decodes
only some of the frames in a reproduction section and
reproduces them, skipping the rest.

[0080] The normal reproducing method is very common, so a
detailed description thereof will be omitted. The method
of decoding only some of the frames in a reproduction
section and reproducing them by using skipping will now be
described.
EP 1 182 584 A2

[0081] The method of decoding only some of the frames in
a reproduction section and reproducing them by using
skipping is a method for implementing quick skimming. The
frames to be displayed can be designated as frames at
predetermined intervals of time. In a method using
interframe compression such as MPEG, I frames, which have
no interframe dependency, can be designated.
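
The choice of frames for quick skimming can be sketched as follows. The representation of a stream as a string of picture types ("I", "P", "B"), one character per frame, and the function names are assumptions for illustration; a real decoder would read the picture types from the bitstream headers.

```python
def i_frames_in_section(frame_types, first, last):
    """Indices of I frames inside the reproduction section [first, last].

    I frames can be decoded on their own, so they are the natural
    frames to display when skipping within interframe-coded video.
    """
    return [i for i in range(first, last + 1) if frame_types[i] == "I"]


def frames_at_interval(first, last, step):
    """Alternative for streams where any frame can be decoded alone:
    display every `step`-th frame of the section."""
    return list(range(first, last + 1, step))


# Three 9-frame groups of pictures; I frames sit at indices 0, 9 and 18.
stream = "IBBPBBPBB" * 3
print(i_frames_in_section(stream, 5, 20))  # [9, 18]
print(frames_at_interval(5, 20, 5))        # [5, 10, 15, 20]
```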

[0082] Figure 9 is a view illustrating an example of a 
quick skimming method using skipping in a reproduction 
section. By using this method, a user can experience 
the effect of obtaining much information and reproduc- 
ing a video file at a high speed. 
[0083] As explained above, in the video skimming 
method using structural information of video content of 
the present invention, segments are designated by two 
steps. Figure 10 is a view illustrating a summary of the 
video skimming method using structural information of 
video content according to the present invention. 
[0084] When video skimming is requested, the system loads
an index file storing structural information of the video
content, including shot and scene information. The system
determines which shots to reproduce and which shots to
skip for each scene (the shot selection step), and
determines the segments to be reproduced and the segments
to be skipped for each shot selected for video skimming
(the segment designation step). Through these two
determination steps, the segments to be reproduced are
continuously outputted to a reproducing apparatus.
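
The two determination steps can be put together in a compact sketch. Everything here is an assumption chosen for illustration: the index is modelled as a list of scenes, each a time-ordered list of `(label, start, end)` shots; shot selection keeps the last shot of each label; and segment designation takes a fixed-length rear portion, shortened for short shots. The name `skim` is hypothetical.

```python
def skim(scenes, segment_length=3.0):
    """Return the playlist of (start, end) segments to reproduce."""
    playlist = []
    for scene in scenes:
        # Step 1, shot selection: keep only the last shot of each label.
        last_index = {label: i for i, (label, _, _) in enumerate(scene)}
        for i in sorted(last_index.values()):
            _, start, end = scene[i]
            # Step 2, segment designation: rear portion of the shot,
            # shortened when the shot is shorter than the segment length.
            length = min(segment_length, end - start)
            playlist.append((end - length, end))
    return playlist


index = [
    [("A", 0.0, 10.0), ("B", 10.0, 12.0), ("A", 12.0, 20.0)],  # scene 1
    [("F", 20.0, 30.0)],                                        # scene 2
]
print(skim(index))  # [(10.0, 12.0), (17.0, 20.0), (27.0, 30.0)]
```

The resulting playlist is what would be handed, segment by segment, to the reproducing apparatus.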

[0085] In Figure 10, the shots to be reproduced as
determined by the shot selection step are shown in gray;
only parts (segments) of a selected shot can be
reproduced and the remaining portion can be skipped.
[0086] Figure 11 is a view illustrating a skimming ap- 
paratus for video skimming according to one embodi- 
ment of the present invention. 

[0087] As illustrated in Figure 11, the video skimming
apparatus of the present invention includes a user inter- 
face unit 101 for inputting a user command such as a 
degree of video skimming and a speed to be used in 
skimming, a master control unit 102 for skimming a cor- 
responding video file based on indexing information on 
shots and scenes according to the user command input- 
ted into the user interface unit 101, a media file 103 for 
providing digital video stream information to the master 
control unit 102, an index file 104 for providing the in- 
dexing information on shots and scenes as structural in- 
formation corresponding to the media file, and a display 
device unit 105 for reproducing the video skimmed by 
the master control unit 102. 

[0088] In the video skimming system of the present 
invention of Figure 11, the index file 104 can be included
in the media file 103. The display device unit 105 is an
output device for displaying a video stream including a 
monitor, a speaker, etc. The user interface unit 101 is 
an inputting means for receiving an input of a user in- 
cluding a keyboard, a mouse, a remote control, buttons, 
etc. 

[0089] The media file 103 is a file storing video (audio)
data, and the index file 104 is a file storing index infor- 
mation on video containing shot clustering information 
and shot segmentation information. 
[0090] The user requests video skimming by using
the user interface unit 101.

[0091] When the video skimming is requested, a sum- 
mary level (degree of skimming) can be designated and 
also a speed to be used in skimming can be designated. 

That is, the user designates, by using the user interface
unit 101, how many minutes the full video is to be
compressed into for viewing. The master control unit 102
determines what portion of what shot will be reproduced
for skimming based on the media file 103 and the
corresponding information of the index file 104 according
to the input of the user, and determines at what speed each
segment will be reproduced. By completing this process,
the master control unit 102 provides a video skimming
function to the user by decoding the media file 103 and
displaying the corresponding frames on the display
device unit 105.
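
The text above does not state how the master control unit 102 turns the designated viewing time into segment lengths. One possible calculation is sketched below, assuming that every selected shot receives a segment of the same length, that a shot shorter than this length is reproduced in full, and that a speed factor of 2.0 means twice the normal speed. The function names and these assumptions are hypothetical.

```python
from typing import List


def equal_segment_length(shot_lengths: List[float], target_s: float,
                         speed: float = 1.0) -> float:
    """Return a common segment length L (seconds of source video) such that
    sum(min(L, shot_length)) equals the source time that fits in target_s."""
    budget = target_s * speed  # seconds of source video viewable in target_s
    if not shot_lengths or budget <= 0:
        return 0.0
    if budget >= sum(shot_lengths):
        return max(shot_lengths)  # everything fits; no shot needs trimming
    remaining = budget
    ordered = sorted(shot_lengths)
    for i, length in enumerate(ordered):
        share = remaining / (len(ordered) - i)
        if length >= share:
            return share          # all remaining shots can supply this share
        remaining -= length       # short shot is reproduced in full
    return ordered[-1]


def segment_lengths(shot_lengths: List[float], target_s: float,
                    speed: float = 1.0) -> List[float]:
    """Per-shot segment length, never longer than the shot itself."""
    common = equal_segment_length(shot_lengths, target_s, speed)
    return [min(common, length) for length in shot_lengths]
```

Shorter shots are handled first so that the time they cannot use is redistributed evenly among the longer shots.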

[0092] As described above, the present invention has
disclosed a video skimming method for simultaneously
complying with a user request for understanding the full
content and moving to a desired position within a
restricted time under a digital video environment.
[0093] In the present invention, the possibility of
reproducing a less important portion relatively often or
missing an actually desired scene is minimized, and the
possibility of repetitively reproducing an interactive
scene or a particular scene in turns is minimized; these
are problems that can occur in existing video
skimming methods.

[0094] The video skimming method of the present
invention is a method for minimizing the necessity of a
user input according to a user request for moving to a
desired position.

[0095] By using the video skimming function of the
present invention, the user can understand the full
content within a short time, does not miss an important
portion in understanding the full content, and can skip a
boring portion easily.

[0096] In addition, the user can use the video skimming
method of the present invention when he or she
wants to move to a desired position. This method is
advantageous in that it requires fewer user inputs
as compared to the method using key frames.
[0097] In conclusion, the present invention can be
employed, for instance, for reproducing video
highlights, and can be utilized as a function for rapidly
searching for a desired scene while minimizing user
input if it is used together with a high speed reproducing
method in reproducing the reproduction sections of each
shot.


Claims 

1. A video skimming method, comprising the steps of: 


recognizing an individual shot section, a phys- 
ical editing unit, as structural information for a 
video stream by shot segmentation; 

selecting a particular portion in the recognized
individual shot section as reproduction video in- 
formation reflecting the content of the corre- 
sponding shot; and 

continuously reproducing the video information 
selected for the individual shot. 

2. The method of claim 1, wherein the video skimming
method further includes the shot selection step of 
determining shots to be reproduced and shots to be 
skipped after recognizing the individual shot sec- 
tion. 

3. The method of claim 1, wherein the structural
information on the video stream is scene information,
i.e., a logical story unit, and shot information, i.e., a
physical editing unit, displayed together with temporal
technical information (starting position and duration,
or starting position and end position), and further
includes technical information on shot properties.

4. The method of claim 2, wherein, in the shot selec- 
tion step, the effect of repetitively reproducing shots 
having similar properties is minimized by determin- 
ing to skip parts of the shots having similar proper- 
ties and to use only the remaining shots for skim- 
ming. 

5. The method of claim 4, wherein, in selecting shots 
to be reproduced from the similar shots, shots to be 
used for skimming are selected by giving a higher 
weight value for selection to shots located at the lat- 
ter half of a scene. 

6. The method of claim 1, wherein, as a reproduction
portion (segment) representative of each shot, the 
front portion, rear portion and center portion of the 
corresponding shot or the front portion and rear por- 
tion thereof are used at the same time. 

7. The method of claim 1, wherein the length of the 
reproduction portion (segment) representative of 
each shot is identical. 

8. The method of claim 7, wherein, if the length of the 
segment selected as the reproduction portion in 
each shot is larger than the length of the corre- 
sponding shot, the length of the reproduction por- 
tion in the individual shot is reduced to below the 
length of the corresponding shot. 

9. The method of claim 1, wherein, based on the
average value of the image/motion/audio similarities
in the individual shot, the length of the reproduction 
portion (segment) representative of each shot is re- 
duced if there is a high similarity, or the reproduction 
length is increased if there is a low similarity. 



10. The method of claim 8, wherein the image/motion/ 
audio similarities in the shot representative of the 
scene mean the similarity in frames, motion vectors 
and audio data with different time positions. 


11. The method of claim 9, wherein if the length of the
segment selected as the reproduction portion in
each shot is larger than the length of the corresponding
shot, the length of the reproduction portion
in the individual shot is reduced to below the
length of the corresponding shot.

12. The method of claim 1, wherein the reproduction
speed of segments to be reproduced in each shot
is controlled variably.

13. The method of claim 12, wherein the reproduction
section is reproduced at a high speed by increasing
a number of frames to be decoded per unit time and
then making the reproduction speed higher than a
normal speed.

14. The method of claim 13, wherein the reproduction
section is reproduced at a high speed by skipping
a few frames in the middle without decoding all
frames in the reproduction section.

15. The method of claim 14, wherein, when the high
speed skimming method using skipping is adapted
to a video stream using a coding scheme utilizing
interframe compression such as MPEG, decoding
frames are I frames which can obtain frame data by
decoding only the corresponding frame without
decoding other frames.


16. A video skimming apparatus, comprising: 

a user interface unit for inputting a user command
for video skimming in order to search and
browse digital video data as multimedia data;

a control unit for skimming the corresponding
video file based on structural information for
video content according to the user command
inputted from the user interface unit;

a video information file for providing the structural
information for the video content to the
control unit as index information for the digital
video data and the corresponding video; and

a display unit for reproducing the video
skimmed by the control unit.

17. The apparatus of claim 16, wherein the structural
information for the video content includes shots
representative of the corresponding scene, a logical
story unit, that are based on scene information and
are reproduced from shot information, a physical
editing unit which is an element of the scene.

18. The apparatus of claim 17, wherein the structural
information for the video content further includes 
segments to be reproduced in the shot representa- 
tive of the corresponding scene. 


19. The apparatus of claim 16, wherein the user interface
unit comprises a unit for designating a summary
level as a degree of video skimming or a unit for
designating the speed of a reproduction section in
video skimming in order to select the summary level
or reproduction speed of video in video skimming.

20. The apparatus of claim 16, wherein the control unit
reads video index information related to shot
segmentation information and shot clustering information
from the index file according to a skimming condition
by using a user input or basic settings, calculates
segments to be reproduced conforming to the
video skimming condition, reproduces the corresponding
segments from the related media file continuously,
and outputs the same to the display unit.

21. A video skimming apparatus, comprising: 

a storage unit for storing digital video data,
scene information which is a logical story unit of
a video content and shot information which is a
physical editing unit of a video content;

a detection unit for detecting shot information
representative of a particular scene based on
the scene information corresponding to the video
data for video skimming;

a selection unit for selecting segments to be
reproduced and segments to be skipped in the
detected shot; and

a reproduction unit for continuously reading the
selected segments to be reproduced from the
storage unit and reproducing the same.

FIG. 3

FIG. 6A

FIG. 7 (labels: Use the center portion of shot; Use the rear portion of shot; Use the front/rear portions of shot)

[Figure labels: Sections to be reproduced in video skimming using shot information; Sections to be reproduced in video skimming; Frames to be reproduced in skimming in reproduction section]

[Figure labels: Segments to be skipped; Segments to be actually reproduced]

[Figure labels: Display device unit; Master control unit; User interface unit; Index file; Media file]